Mining semantic networks of bioinformatics e-resources from the literature

Authors

  • Hammad Afzal
  • James M. Eales
  • Robert Stevens
  • Goran Nenadic
Abstract

BACKGROUND There have been a number of recent efforts (e.g. BioCatalogue, BioMoby) to systematically catalogue bioinformatics tools, services and datasets. These efforts rely on manual curation, which makes it difficult to cope with the huge influx of electronic resources provided by the bioinformatics community. We present a text mining approach that uses the literature to automatically extract descriptions of bioinformatics resources and semantically profile them, making them available for resource discovery and exploration through semantic networks of related resources.

RESULTS The method identifies mentions of resources in the literature and assigns each a set of co-occurring terminological entities (descriptors) to represent it. We processed 2,691 full-text bioinformatics articles and extracted profiles of 12,452 resources, each containing associated descriptors with binary and tf*idf weights. Since such representations are typically sparse (on average 13.77 features per resource), we used lexical kernel metrics to identify semantically related resources via descriptor smoothing. Resources are then clustered or linked into semantic networks, allowing users (bioinformaticians, curators and service/tool crawlers) to explore algorithms, tools, services and datasets based on their relatedness. Manual exploration of links between a set of 18 well-known bioinformatics resources suggests that the method was able to identify and group semantically related entities.

CONCLUSIONS The results show that the method can reconstruct interesting functional links between resources (e.g. linking data types and algorithms), in particular when tf*idf-like weights are used for profiling. This demonstrates the potential of combining literature mining and simple lexical kernel methods to model relatedness between resource descriptors, in particular when there are few features, thus potentially improving the resource description, discovery and exploration process. The resource profiles are available at http://gnode1.mib.man.ac.uk/bioinf/semnets.html.
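
To make the profiling and smoothing idea in the abstract more concrete, here is a minimal Python sketch. It builds tf*idf descriptor profiles for a few resources and compares them twice: once with plain cosine similarity, and once with a simple lexical kernel that gives partial credit to descriptors sharing words. The abstract does not specify the kernel actually used, so the word-overlap (Jaccard) similarity, the function names and the toy descriptor lists are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def tfidf_profiles(mentions):
    """Build a tf*idf-weighted descriptor profile per resource.

    mentions maps a resource name to the list of descriptors that
    co-occur with it (repeats allowed). Hypothetical input format.
    """
    n = len(mentions)
    df = Counter()  # number of resources in which each descriptor appears
    for descriptors in mentions.values():
        df.update(set(descriptors))
    profiles = {}
    for resource, descriptors in mentions.items():
        tf = Counter(descriptors)
        profiles[resource] = {
            d: count * math.log(n / df[d]) for d, count in tf.items()
        }
    return profiles


def cosine(p, q):
    """Plain cosine similarity: only exactly matching descriptors count."""
    dot = sum(w * q.get(d, 0.0) for d, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def lexical_sim(d1, d2):
    """Assumed descriptor similarity: Jaccard overlap of their words."""
    a, b = set(d1.lower().split()), set(d2.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def smoothed_kernel(p, q):
    """Sum weight products over all descriptor pairs, scaled by lexical overlap."""
    return sum(
        wp * wq * lexical_sim(dp, dq)
        for dp, wp in p.items()
        for dq, wq in q.items()
    )


def normalised_kernel(p, q):
    """Smoothed kernel rescaled so that a profile compared with itself gives 1."""
    denom = math.sqrt(smoothed_kernel(p, p) * smoothed_kernel(q, q))
    return smoothed_kernel(p, q) / denom if denom else 0.0


# Toy data: the descriptor lists are invented for illustration only.
mentions = {
    "BLAST": ["sequence alignment", "protein sequence", "similarity search"],
    "ClustalW": ["multiple sequence alignment", "protein sequence"],
    "Taverna": ["workflow", "web service"],
}
profiles = tfidf_profiles(mentions)

for a, b in [("BLAST", "ClustalW"), ("BLAST", "Taverna")]:
    print(a, b,
          round(cosine(profiles[a], profiles[b]), 3),
          round(normalised_kernel(profiles[a], profiles[b]), 3))
```

In this toy setting the smoothed kernel scores the two alignment tools as more related than plain cosine does, because "sequence alignment" and "multiple sequence alignment" overlap lexically even though they are distinct descriptors. That is the kind of effect descriptor smoothing is meant to have on sparse profiles.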


Similar resources

The CALBC RDF Triple Store: Retrieval over Large Literature Content

Integration of the scientific literature into a biomedical research infrastructure requires processing of the literature, identification of the contained named entities (NEs) and concepts, and representation of the content in a standardised way. The CALBC project partners (PPs) have produced a large-scale annotated biomedical corpus with four different semantic groups through the harmonisation o...


Towards mature use of semantic resources for biomedical analyses

EMBL Outstation, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK. Use of semantic resources, such as ontologies, has been the key to the standardisation of IT solutions and their interoperability in recent years. Combining semantic resources with biomedical data analysis is developing into a dedicated research domain and stimulates related research in biomedical research fiel...


A Pattern Language for Knowledge Discovery in a Semantic Web context

Ontologies are used to represent data and share knowledge of a specific domain, and in recent years they tend to be used in many applications such as database integration, peer-to-peer systems, e-commerce, semantic web services, bioinformatics, or social networks. Feeding ontological domain knowledge into those applications has proven to increase flexibility and inter-operability and interpreta...


Text mining meets workflow: linking U-Compare with Taverna

Text mining from the biomedical literature is of increasing importance, yet it is not easy for the bioinformatics community to create and run text mining workflows due to the lack of accessibility and interoperability of the text mining resources. The U-Compare system provides a wide range of bio text mining resources in a highly interoperable workflow environment where workflows can...


OrganismTagger: detection, normalization and grounding of organism entities in biomedical documents

MOTIVATION Semantic tagging of organism mentions in full-text articles is an important part of literature mining and semantic enrichment solutions. Tagged organism mentions also play a pivotal role in disambiguating other entities in a text, such as proteins. A high-precision organism tagging system must be able to detect the numerous forms of organism mentions, including common names as well a...



Journal title:

Volume 2, Issue

Pages -

Publication date: 2009